image annotation
Augmenting Image Annotation: A Human-LMM Collaborative Framework for Efficient Object Selection and Label Generation
Zhang, He, Fu, Xinyi, Carroll, John M.
Traditional image annotation tasks rely heavily on human effort for object selection and label assignment, making the process time-consuming and prone to decreased efficiency as annotators experience fatigue after extensive work. This paper introduces a novel framework that leverages the visual understanding capabilities of large multimodal models (LMMs), particularly GPT, to assist annotation workflows. In our proposed approach, human annotators focus on selecting objects via bounding boxes, while the LMM autonomously generates relevant labels. This human-AI collaborative framework enhances annotation efficiency by reducing the cognitive and time burden on human annotators. By analyzing the system's performance across various types of annotation tasks, we demonstrate its ability to generalize to tasks such as object recognition, scene description, and fine-grained categorization. Our proposed framework highlights the potential of this approach to redefine annotation workflows, offering a scalable and efficient solution for large-scale data labeling in computer vision. Finally, we discuss how integrating LMMs into the annotation pipeline can advance bidirectional human-AI alignment, as well as the challenges of alleviating the "endless annotation" burden in the face of information overload by shifting some of the work to AI.
- North America > United States > New York > New York County > New York City (0.06)
- North America > United States > Pennsylvania > Centre County > University Park (0.04)
- Asia > China > Beijing > Beijing (0.04)
- (3 more...)
- Research Report (0.66)
- Workflow (0.56)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
Deep Representations and Codes for Image Auto-Annotation
The task of image auto-annotation, namely assigning a set of relevant tags to an image, is challenging due to the size and variability of tag vocabularies. Consequently, most existing algorithms focus on tag assignment and fix an often large number of hand-crafted features to describe image characteristics. In this paper we introduce a hierarchical model for learning representations of standard sized color images from the pixel level, removing the need for engineered feature representations and subsequent feature selection for annotation.
- North America > Canada > Alberta > Census Division No. 11 > Edmonton Metropolitan Region > Edmonton (0.14)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Understanding Subjectivity through the Lens of Motivational Context in Model-Generated Image Satisfaction
Dutta, Senjuti, Chen, Sherol, Mak, Sunny, Ahmad, Amnah, Collins, Katherine, Butryna, Alena, Ramachandran, Deepak, Dvijotham, Krishnamurthy, Pavlick, Ellie, Rajakumar, Ravi
Image generation models are poised to become ubiquitous in a range of applications. These models are often fine-tuned and evaluated using human quality judgments that assume a universal standard, failing to consider the subjectivity of such tasks. To investigate how to quantify subjectivity, and the scale of its impact, we measure how assessments differ among human annotators across different use cases. Simulating the effects of ordinarily latent elements of annotators subjectivity, we contrive a set of motivations (t-shirt graphics, presentation visuals, and phone background images) to contextualize a set of crowdsourcing tasks. Our results show that human evaluations of images vary within individual contexts and across combinations of contexts. Three key factors affecting this subjectivity are image appearance, image alignment with text, and representation of objects mentioned in the text. Our study highlights the importance of taking individual users and contexts into account, both when building and evaluating generative models
- North America > United States > Tennessee > Knox County > Knoxville (0.14)
- North America > United States > California > Santa Clara County > Mountain View (0.05)
- North America > United States > Virginia (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Health & Medicine (0.68)
- Information Technology (0.46)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
A semantics-driven methodology for high-quality image annotation
Giunchiglia, Fausto, Bagchi, Mayukh, Diao, Xiaolei
Recent work in Machine Learning and Computer Vision has highlighted the presence of various types of systematic flaws inside ground truth object recognition benchmark datasets. Our basic tenet is that these flaws are rooted in the many-to-many mappings which exist between the visual information encoded in images and the intended semantics of the labels annotating them. The net consequence is that the current annotation process is largely under-specified, thus leaving too much freedom to the subjective judgment of annotators. In this paper, we propose vTelos, an integrated Natural Language Processing, Knowledge Representation, and Computer Vision methodology whose main goal is to make explicit the (otherwise implicit) intended annotation semantics, thus minimizing the number and role of subjective choices. A key element of vTelos is the exploitation of the WordNet lexico-semantic hierarchy as the main means for providing the meaning of natural language labels and, as a consequence, for driving the annotation of images based on the objects and the visual properties they depict. The methodology is validated on images populating a subset of the ImageNet hierarchy.
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)
- Asia (0.04)
Toward Real-Time Image Annotation Using Marginalized Coupled Dictionary Learning
Roostaiyan, Seyed Mahdi, Hosseini, Mohammad Mehdi, Kashani, Mahya Mohammadi, Amiri, S. Hamid
In most image retrieval systems, images include various high-level semantics, called tags or annotations. Virtually all the state-of-the-art image annotation methods that handle imbalanced labeling are search-based techniques which are time-consuming. In this paper, a novel coupled dictionary learning approach is proposed to learn a limited number of visual prototypes and their corresponding semantics simultaneously. This approach leads to a real-time image annotation procedure. Another contribution of this paper is that utilizes a marginalized loss function instead of the squared loss function that is inappropriate for image annotation with imbalanced labels. We have employed a marginalized loss function in our method to leverage a simple and effective method of prototype updating. Meanwhile, we have introduced ${\ell}_1$ regularization on semantic prototypes to preserve the sparse and imbalanced nature of labels in learned semantic prototypes. Finally, comprehensive experimental results on various datasets demonstrate the efficiency of the proposed method for image annotation tasks in terms of accuracy and time. The reference implementation is publicly available on https://github.com/hamid-amiri/MCDL-Image-Annotation.
Learning Hybrid Models for Image Annotation with Partially Labeled Data
Extensive labeled data for image annotation systems, which learn to assign class labels to image regions, is difficult to obtain. We explore a hybrid model framework for utilizing partially labeled data that integrates a generative topic model for image appearance with discriminative label prediction. We propose three alternative formulations for imposing a spatial smoothness prior on the image labels. Tests of the new models and some baseline approaches on two real image datasets demonstrate the effectiveness of incorporating the latent structure.
Why radiologists in the healthcare sector will gain from AI
In radiology field, talk of artificial intelligence (AI) has become commonplace. Many think AI, also known as machine learning or deep learning, may improve radiologists' operations, enable quantitative radiology, and help identify genetic markers. Radiologists are very active medical specialists. Any errors would be unaffordable. They must communicate with a broad spectrum of referring medical professionals, including neurologists, urologists, orthopedic specialists, and more.
- Health & Medicine > Nuclear Medicine (1.00)
- Health & Medicine > Diagnostic Medicine > Imaging (1.00)
What is Data Annotation and What are its Advantages?
AI and machine learning is one the fastest growing technology brining unbelievable innovations providing the advantages to different fields globally. And to create such automated applications or machines, huge amount of training data sets is required. And to create such data sets, image annotation technique is used to make the objects recognizable to computer vision for machine learning. And this annotation process is benefiting not only the AI filed but also providing advantages to other stakeholders. Here we will discuss about the advantages of data annotation in various fields.
RF-Annotate: Automatic RF-Supervised Image Annotation of Common Objects in Context
Wireless tags are increasingly used to track and identify common items of interest such as retail goods, food, medicine, clothing, books, documents, keys, equipment, and more. At the same time, there is a need for labelled visual data featuring such items for the purpose of training object detection and recognition models for robots operating in homes, warehouses, stores, libraries, pharmacies, and so on. In this paper, we ask: can we leverage the tracking and identification capabilities of such tags as a basis for a large-scale automatic image annotation system for robotic perception tasks? We present RF-Annotate, a pipeline for autonomous pixel-wise image annotation which enables robots to collect labelled visual data of objects of interest as they encounter them within their environment. Our pipeline uses unmodified commodity RFID readers and RGB-D cameras, and exploits arbitrary small-scale motions afforded by mobile robotic platforms to spatially map RFIDs to corresponding objects in the scene. Our only assumption is that the objects of interest within the environment are pre-tagged with inexpensive battery-free RFIDs costing 3-15 cents each. We demonstrate the efficacy of our pipeline on several RGB-D sequences of tabletop scenes featuring common objects in a variety of indoor environments.
- North America > United States > New York (0.04)
- North America > United States > Illinois (0.04)
- North America > United States > California > Alameda County > Oakland (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)